Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


Posters

Posters at ISMB 2020 will be presented virtually. Authors will pre-record their poster talk (5-7 minutes) and upload it to the virtual conference platform along with a PDF of their poster. All registered conference participants will have access to the posters and presentations throughout the conference, and the content will remain available until October 31, 2020. A chat function provides Q&A opportunities for interaction between presenters and participants.

Preliminary information on preparing your poster and poster talk is available at: https://www.iscb.org/ismb2020-general/presenterinfo#posters

Ideally authors should be available for interactive chat during the times noted below:


Poster Session A: July 13 and July 14, 7:45 am - 9:15 am Eastern Daylight Time
Poster Session B: July 15 and July 16, 7:45 am - 9:15 am Eastern Daylight Time
July 14 between 10:40 am - 2:00 pm EDT
A framework for decoding GWAS studies via deletion-interrupted TADs.
COSI: VarI COSI
  • Xuanshi Liu, Beijing Children Hospital, China
  • Wei Li, Beijing Children Hospital, China

Short Abstract: Genome-Wide Association Study (GWAS) is a widely adopted approach for associating Single Nucleotide Polymorphisms (SNPs) with diseases and traits. The majority of GWAS-related SNPs are located in noncoding regions, which makes it difficult to understand their biological implications. Recent studies have shown that noncoding SNPs detected through GWAS can be enriched in enhancers and may cause disease by changing gene expression over long distances. We hypothesize that noncoding GWAS SNPs in enhancer regions can interact with genes within a Topologically Associating Domain (TAD). If a deletion (DEL) occurs within a TAD boundary, gene expression can be affected, thus playing a role in disease etiology. To test this hypothesis, we generated an integrated map of GWAS SNPs, enhancers, TADs and DELs, and scored possible GWAS-SNP-TAD-DEL pairs. This study highlighted 201,132 high-confidence pairs of GWAS SNPs and target genes. Additionally, significant enrichments were observed in several development-related processes, morphogenesis and leukemia, in line with previous long-range studies. Overall, our work provides a tool for decoding GWAS SNPs via genome-wide DEL-interrupted TADs and catalogs high-confidence pairs, which could facilitate the understanding of their potential biological effects.

A haplotype aware rule-based method for evaluating deleteriousness of variants present in an individual exome
COSI: VarI COSI
  • Sutapa Datta, Tata Consultancy Services, India
  • Vinay Lanke, Tata Consultancy Services, India
  • Rajgopal Srinivasan, Tata Consultancy Services, India

Short Abstract: Rapid and continuous progress in next generation sequencing technology has identified several variants and genes causing many complex diseases. In this study, a rule-based method is proposed to predict the risk variants present in an individual exome. Each allele is scored by applying several rules to its associated annotations, utilizing the haplotype information and the interrelationship between the alleles. The score for a gene is reckoned as the sum of the variant allele scores. Separate sets of rules are applied for different types of Single Nucleotide Variants (such as nonsynonymous, synonymous, start gain, start loss, stop gain and stop loss) and Indels (e.g. frameshift and non-frameshift). To decide if an allele is deleterious, a threshold value is calculated by scoring pathogenic and benign variants present in the ClinVar database. Applying the proposed method to variants in 78 genes associated with metabolic disorders, 96.9% of all 1000 Genomes samples are classified as healthy. Extensive testing based on application of the method to the CAGI ENIGMA dataset, variants from the LOVD database and exomes of healthy and diseased individuals demonstrates the efficacy of the proposed method.
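
The scoring scheme described in this abstract (rules applied per allele, gene score as the sum of allele scores, a threshold for calling) can be sketched roughly as follows. This is only an illustration: the rule weights, the annotation keys (`consequence`, `population_af`, `gene`) and the threshold are hypothetical and are not taken from the poster, and the haplotype-aware rules are not modelled here.

```python
# Illustrative sketch only: weights, annotation keys and thresholds are
# hypothetical and do not come from the poster.
from collections import defaultdict

# Hypothetical base score per variant consequence.
RULES = {
    "stop_gain": 1.0,
    "frameshift": 1.0,
    "start_loss": 0.9,
    "nonsynonymous": 0.5,
    "synonymous": 0.05,
}


def score_allele(allele):
    """Score one variant allele from its annotations."""
    score = RULES.get(allele["consequence"], 0.0)
    # Hypothetical rule: alleles that are common in the population
    # are down-weighted.
    if allele.get("population_af", 0.0) > 0.01:
        score *= 0.1
    return score


def score_genes(alleles):
    """Gene score = sum of the scores of the variant alleles in that gene."""
    totals = defaultdict(float)
    for allele in alleles:
        totals[allele["gene"]] += score_allele(allele)
    return dict(totals)


def flag_genes(alleles, threshold):
    """Return the genes whose summed score reaches the given threshold."""
    scores = score_genes(alleles)
    return sorted(gene for gene, s in scores.items() if s >= threshold)
```

In the abstract the threshold is derived by scoring known pathogenic and benign ClinVar variants; here it is simply passed in as a parameter.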

APOEε4 is associated with chronic traumatic encephalopathy (CTE)
COSI: VarI COSI
  • Kathryn L. Lunetta, Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
  • Gary Benson, Boston University, United States
  • Jesse Mez, BU CTE Ctr./BU AD Ctr./Department of Neurology, Boston University School of Medicine, Boston, MA, United States
  • Jaeyoon Chung, Section of Biomedical Genetics, Department of Medicine, Boston University School of Medicine, Boston, MA, United States
  • Mohammed Muzamil Khan, Bioinformatics Graduate Program, Boston University, Boston, MA, United States
  • Kathryn Atherton, Bioinformatics Graduate Program, Boston University, Boston, MA, United States
  • Conor Shea, Bioinformatics Graduate Program, Boston University, Boston, MA, United States
  • Ann C. McKee, CTE Ctr/AD Ctr/Depts. of Path. & Lab. Med./Neuro, BU Med. Sch; VA Boston Health. Sys, Boston; VA Med. Ctr, Bedford, MA, United States
  • John F. Crary, Dept. of Pathology/Fishberg Dept. of Neuroscience/Ronald M. Loeb Center for Alzheimer’s Disease, ISMMS, New York, NY, United States
  • Lindsay A. Farrer, Sec. of Biomed. Genetics, Dept. of Med./BU AD Ctr./Dept. of Neuro., BU Med. Sch.; Dept. of Biostat., BU SPH, Boston, MA, United States
  • Thor D. Stein, BU CTE Ctr./BU AD Ctr./Dept. Path. & Lab. Med., BU Med. Sch.; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Yorghos Tripodis, BU CTE Ctr./BU AD Ctr., BU School of Medicine; Dept. of Biostatistics, BU School of Public Health, Boston, MA, United States
  • Evan Nair, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Michael L. Alosco, BU CTE Ctr./BU AD Ctr./Dept. of Neurology, Boston University School of Medicine, Boston, MA, United States
  • Bertrand R. Huber, BU CTE Ctr./AD Ctr./Dept. of Neurology, BU Sch. of Med.; VA Boston Health. Sys., Boston; VA Medical Ctr., Bedford, MA, United States
  • Victor E. Alvarez, BU CTE Ctr., BU Sch. of Med./Data Coordinating Ctr., BU SPH; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Jonathan Cherry, BU CTE Ctr./AD Ctr./Dept. of Path. & Lab. Med., BU Med. Sch.; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Kurt Farrell, Dept. of Pathology/Fishberg Dept. of Neuroscience/Ronald M. Loeb Center for Alzheimer’s Disease, ISMMS, New York, NY, United States
  • Joseph N. Palmisano, Data Coordinating Center, BU School of Public Health/BU Alzheimer's Disease Center, BU School of Medicine, Boston, MA, United States
  • Brett M. Martin, Data Coordinating Center, BU School of Public Health/BU Alzheimer's Disease Center, BU School of Medicine, Boston, MA, United States
  • Madeline Uretsky, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Bobak Abdolmohammadi, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Zachary H. Baucom, Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States

Short Abstract: Chronic traumatic encephalopathy (CTE) is a neurodegenerative disorder associated with exposure to repetitive head impacts (RHI). It is characterized by the deposition of hyperphosphorylated tau protein (p-tau) throughout the brain. However, the occurrence and severity of CTE vary widely among those with similar RHI exposure, suggesting that other risk factors, including genetics, may have a role. The Apolipoprotein E (APOE) ε4 allele is a major risk factor for Alzheimer’s disease. Limited data suggest it may also confer risk for CTE. Here we used linear and ordinal logistic regression models to test whether APOEε4 dosage is associated with the presence and severity of CTE and related endophenotypes among 295 autopsy-confirmed cases and 240 controls with known RHI exposure. All models were adjusted for age and race. APOEε4 dosage was significantly associated with CTE stage (p = 0.006), presence of dementia (p = 0.014), and both quantitative and semi-quantitative measures of total p-tau burden across the brain (p = 0.024 and 0.001, respectively). Associations were strongest for p-tau burden in the frontal cortex (p = 6e-7), one of the first regions affected by CTE. This study provides the most concrete evidence to date that APOEε4 is a risk factor for CTE.

BubbleGun: Enumerating Bubbles and Superbubbles in Genome Graphs
COSI: VarI COSI
  • Fawaz Dabbaghie, Helmholtz Institute for Pharmaceutical Research Saarland, Germany
  • Tobias Marschall, Institute for Medical Biometry and Bioinformatics, University Hospital, Heinrich Heine University, Düsseldorf, Germany

Short Abstract: With the fast development of third-generation sequencing machines, de novo genome assembly is becoming routine even for larger genomes. Graph-based representations of genomes arise both as part of the assembly process and in the context of pangenomes representing a population. In both cases, polymorphic loci lead to bubble structures in such graphs. Detecting bubbles is hence an important task when working with genomic variants in the context of genome graphs.
Here, we present a fast general-purpose tool, called BubbleGun, for detecting bubbles and superbubbles in genome graphs. Furthermore, BubbleGun detects and outputs runs of linearly connected bubbles and superbubbles, which we call bubble chains. We showcase the utility of our tool on de Bruijn graphs built from human short read sequencing data, where it reports around 2 million bubbles and superbubbles in less than 30 minutes. Moreover, we used BubbleGun on a pangenome from 10 Myxococcus xanthus strains, detecting all bubble chains which covered the entire graph in less than 20 seconds.
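
To make the notion of a bubble concrete, the sketch below finds only the simplest case in a directed graph: a source node whose branches each have the source as their sole predecessor and all rejoin at one common sink. It is not BubbleGun's algorithm or interface; the function name and the adjacency-dict input format are assumptions made for illustration, and superbubbles and bubble chains need a more general traversal.

```python
def find_simple_bubbles(graph):
    """Return (source, sink, branches) for every simple bubble.

    `graph` maps each node to a list of its successors.
    """
    # Build predecessor lists so in-degree checks are cheap.
    preds = {node: [] for node in graph}
    for node, succs in graph.items():
        for succ in succs:
            preds.setdefault(succ, []).append(node)

    bubbles = []
    for source, succs in graph.items():
        branches = sorted(set(succs))
        if len(branches) < 2:
            continue  # a bubble needs at least two alternative paths
        sinks = set()
        is_bubble = True
        for branch in branches:
            out = set(graph.get(branch, []))
            # Each branch node must be entered only from the source
            # and must leave to exactly one node.
            if set(preds.get(branch, [])) != {source} or len(out) != 1:
                is_bubble = False
                break
            sinks |= out
        if is_bubble and len(sinks) == 1:
            sink = sinks.pop()
            if sink != source and sink not in branches:
                bubbles.append((source, sink, branches))
    return bubbles
```

For the graph A→B, A→C, B→D, C→D this reports one bubble with source A, sink D and branches B and C, which is the shape a single-nucleotide difference typically produces.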

Characterisation of CYP2D6 Pharmacogenomic Variation in African Populations: An Integrative Bioinformatics Approach
COSI: VarI COSI
  • David Twesigomwe, University of the Witwatersrand, Johannesburg, South Africa
  • Jorge da Rocha, University of the Witwatersrand, Johannesburg, South Africa
  • Britt Drögemöller, University of Manitoba, Winnipeg, Canada
  • Galen Wright, University of Manitoba, Winnipeg, Canada
  • Zane Lombard, University of the Witwatersrand, Johannesburg, South Africa
  • Scott Hazelhurst, University of the Witwatersrand, Johannesburg, South Africa

Short Abstract: BACKGROUND:
CYP2D6 genetic variation contributes markedly to inter-individual differences in response to medications as the enzyme metabolises 25% of commonly prescribed medications. CYP2D6 phenotype categories include poor, intermediate, normal, and ultra-rapid metabolisers. However, accurately genotyping CYP2D6 is quite challenging due to copy number variations in the gene locus and complex hybrid rearrangements with the neighbouring pseudogene, CYP2D7.

Our study aimed to identify CYP2D6 star alleles from high coverage short-read African whole genome sequence data using specialised bioinformatics algorithms, and estimate the distribution of CYP2D6 metaboliser phenotypes across Africa.

DESCRIPTION:
We benchmarked three state-of-the-art CYP2D6 star allele calling algorithms using 75 publicly available reference datasets. We then used a consensus genotyping approach involving all three tools to call CYP2D6 star alleles from 458 high coverage African whole genome sequence datasets generated by the Human Heredity and Health in Africa Consortium and the Simons Genome Diversity Project. Predefined activity scores were used for phenotype prediction.

CONCLUSION:
This study elucidates the distribution of CYP2D6 star alleles and predicted phenotypes in African populations, which could inform future precision medicine strategies. Additionally, our benchmark of the genotyping tools could be useful to other researchers interested in genotyping CYP2D6 or other highly polymorphic pharmacogenes.
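
The consensus genotyping and activity-score steps described above could look roughly like the following sketch. The majority-vote rule (`min_agree=2`), the allele score table passed in by the caller and the phenotype bins are all assumptions for illustration; the abstract does not state how disagreements between the three tools were resolved, and the bins should be checked against current guidelines before any real use.

```python
from collections import Counter


def normalise(diplotype):
    """Sort the two star alleles so '*4/*1' and '*1/*4' compare equal."""
    return "/".join(sorted(diplotype.split("/")))


def consensus_call(calls, min_agree=2):
    """Return the diplotype reported by at least `min_agree` callers, else None."""
    counts = Counter(normalise(call) for call in calls if call)
    if not counts:
        return None
    diplotype, n = counts.most_common(1)[0]
    return diplotype if n >= min_agree else None


def activity_score(diplotype, allele_scores):
    """Sum the predefined activity scores of the two alleles."""
    return sum(allele_scores[allele] for allele in diplotype.split("/"))


def phenotype(score):
    """Map an activity score to a metaboliser category (assumed bins)."""
    if score == 0:
        return "poor metaboliser"
    if score < 1.25:
        return "intermediate metaboliser"
    if score <= 2.25:
        return "normal metaboliser"
    return "ultra-rapid metaboliser"
```

For example, if two of three tools report `*1/*4` (in either allele order), `consensus_call` returns `*1/*4`; with an illustrative table such as `{"*1": 1.0, "*4": 0.0}` the activity score is 1.0.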

Computational Prediction of Neoantigens: Implemented as Installable GALAXY Workflow.
COSI: VarI COSI
  • Ambarish Kumar, Jawaharlal Nehru University, New Delhi, India
  • Ray Sajulga, Center for International Blood and Marrow Transplant Research, United States
  • James Johnson, Minnesota Supercomputing Institute, University of Minnesota, United States
  • Björn Andreas Grüning, Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany

Short Abstract: The existence of endemic viruses and their likely zoonotic transmission raises the alarm for the development of population-specific vaccines. This study was performed on the public GALAXY server, GALAXY Europe (usegalaxy.eu/), to predict neoantigens. The following GALAXY workflows were built, made accessible and incorporated into the current study.
1. Genomic variant detection
1.1 usegalaxy.eu/u/ambarishk/w/gatk4
1.2 usegalaxy.eu/u/ambarishk/w/varscan
2. HLA typing and neoantigen prediction
2.1 usegalaxy.eu/u/ambarishk/w/neo-antigen-prediction
The GALAXY workflow executes schematically as follows.
Alignment → Genomic variant discovery → Coding DNA mutation to mutant peptide sequences → HLA typing → Neoantigen prediction.
The preliminary result is specific to the SARS-CoV-2 infected population of Wuhan, China. It is accessible as a shared GALAXY history: usegalaxy.eu/u/ambarishk/h/neo-antigen-prediction.
Extended work will include development of a dedicated GALAXY server for neoantigen prediction and antigenicity analysis, as well as generation of results for all virus infections whose vaccines are under development and for diseases requiring personalised neoantigen therapy. It will be an effective computational platform for population-specific and personalised neoantigen therapy. As a collaborative and shared effort, we will work with the Galaxy for proteomics (Galaxy-P) team at the University of Minnesota. The benefits of computational prediction of neoantigens on GALAXY are automation, reproducibility, and time- and cost-effectiveness.
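
One step of the pipeline above, turning a coding mutation into candidate peptide sequences, can be illustrated with a short sketch. It assumes a single amino acid substitution and a fixed peptide length of 9 residues (a length often used for HLA class I binding prediction); the function name and parameters are hypothetical and do not correspond to any tool in the GALAXY workflows.

```python
def mutant_peptides(protein, position, alt, length=9):
    """Return every window of `length` residues that contains the substitution.

    `position` is 1-based; `alt` is the substituted amino acid.
    """
    idx = position - 1
    if not 0 <= idx < len(protein):
        raise ValueError("position lies outside the protein sequence")
    # Apply the substitution to get the mutant protein sequence.
    mutant = protein[:idx] + alt + protein[idx + 1:]
    # First and last window start positions that still cover the mutated residue.
    first = max(0, idx - length + 1)
    last = min(idx, len(mutant) - length)
    return [mutant[start:start + length] for start in range(first, last + 1)]
```

Each returned peptide would then be passed, together with the HLA type, to a binding predictor in the final step of the workflow.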

ConsensuSV: consensus structural-variant caller for next generation sequencing for selected families from 1000 Genomes Project
COSI: VarI COSI
  • Mateusz Chiliński, Centre of New Technologies, University of Warsaw, Poland
  • Agnieszka Kraft, Centre of New Technologies, University of Warsaw, Poland
  • Dariusz Plewczynski, Centre of New Technologies, University of Warsaw, Poland

Short Abstract: The talk presents ongoing research results on the connection between the 3D structure of the human genome and its functional effects on the development of clinical phenotypes. The biological samples analysed originate from three sources: first, the 1000 Genomes Project (three healthy families, parents with daughters); second, two Polish families in which the child developed type 1 diabetes; and third, a single family from Japan in which the child developed leukaemia.

We use 12 gold-standard callers to obtain accurate Structural Variants (SVs) for each family. The results of the tools are merged using our novel algorithm ConsensuSV, which integrates the SV sets using machine learning, combining decision trees and neural networks trained and benchmarked on the high-quality SVs from the 1000 Genomes Project. This approach allows us to create sets of high-confidence SVs for each analysed trio.

Further, we applied ConsensuSV to families of known phenotype. The resulting list of SVs is used to identify genes whose expression is altered due to changes in spatial chromatin organisation. Finally, the functional impact of the observed SVs is validated by analysing the list of biological processes that involve those genes.

Development of a tool to annotate mutated genes by integrating STRING and KEGG pathway
COSI: VarI COSI
  • Eri Hayashi, Tokyo Denki University, Japan
  • Yuto Kimura, Tokyo Denki University, Japan
  • Shuichi Hirose, NAGASE & CO., LTD., Japan
  • Wataru Nemoto, Tokyo Denki University, Japan

Short Abstract: N-methyl-N’-nitro-N-nitrosoguanidine (NTG)-induced mutagenesis has been widely used to induce mutations in microorganisms. We iterated the experiment in Streptomyces lividans 1326 to obtain strains that produce PLA2 more effectively than the wild strain. A total of 374 non-synonymous single nucleotide polymorphisms (SNPs) were identified by comparing the genome of the wild strain with those of ten mutant strains. To elucidate the metabolic mechanism behind the effective PLA2 production, we tried to identify KEGG metabolic pathways in which mutated genes cluster with statistical significance. However, we could not find the genes or pathways responsible for the effective PLA2 production. One reason for this failure is that only 23% of the mutated genes could be assigned to a gene in any of the metabolic pathways. Hence, we extended the metabolic pathways by integrating STRING and KEGG pathways. With this extension, 98.2% of the mutated genes were assigned to a gene in at least one of the extended metabolic pathways. As a result, the functions of these genes have been successfully estimated. In addition, we found that the mutated genes clustered into several metabolic pathways. In this poster session, we will discuss the genes and pathways responsible for the effective PLA2 production.

eDGAR+: a data resource of annotated gene-variant-disease relations
COSI: VarI COSI
  • Giulia Babbi, Biocomputing Group Bologna, Italy
  • Pier Luigi Martelli, University of Bologna, Italy
  • Castrense Savojardo, University of Bologna, Italy
  • Davide Baldazzi, University of Bologna, Italy
  • Teresa Tavella, University of Bologna, Italy
  • Rita Casadio, University of Bologna, Italy

Short Abstract: Next-generation sequencing techniques reveal a huge number of genetic mutations that need to be annotated. To comprehensively annotate genes and variants we need to consider the different layers of information describing the biological complexity, merging data from many resources, especially when we want to investigate the molecular mechanisms characterizing genes and variants associated with diseases.
Here we present the new version of eDGAR, eDGAR+, a webserver of gene/variant-disease associations that retrieves comprehensive annotation for genes and variants associated with diseases. We collected more than 17,200 curated associations from UniProtKB, ClinVar, OMIM and DisGeNET. The main novelties of the new version of eDGAR are: i) the inclusion of variant-disease associations and their known effects on protein products; ii) the new annotation pipeline that includes information on tissue of expression, subcellular localization and possibly related drugs; iii) the updated enrichment analysis performed with the NETGE-PLUS algorithm for the annotation of GO, KEGG and REACTOME terms characterizing sets of genes related to the same disease. eDGAR+ allows retrieving all the features shared by a user-defined set of genes or variants. Researchers may take advantage of eDGAR+ by retrieving the curated annotation of genes/variants of interest and analysing shared features and molecular mechanisms to direct further experiments.

Enhanced Splicing Annotation And RNA-Seq From Clinically Accessible Tissues Improves Outlier Prediction For Non-accessible Tissues
COSI: VarI COSI
  • Muhammed Hasan Çelik, Technical University of Munich, Germany
  • Nils Wagner, Technical University of Munich, Germany
  • Julien Gagneur, Technical University of Munich, Germany

Short Abstract: Aberrant splicing is a major cause of genetic diseases. However, the affected tissues of a large set of genetic disorders, including cardiac and neurological disorders, are not clinically accessible, preventing experimental detection of aberrant splicing. Here we develop the first benchmark datasets and algorithms for predicting tissue-specific aberrant splicing. We focus on the task of prioritizing rare genetic variants. Applying MMSplice, a state-of-the-art model predicting percent-spliced-in based on DNA sequence, to existing exon annotations shows limited performance for outlier prediction. A substantial improvement is obtained by combining MMSplice with a tissue-specific map of splice sites and splicing fractions (Percent Spliced-In) that we generated. Finally, a model which further integrates splicing measurements from whole blood RNA-seq reaches a median AU-PRC of 12%, i.e. about a 15-fold improvement over MMSplice alone. Altogether, our approach and results have implications for non-invasive genetic diagnostics, including in neonatal settings.

EpiGEN: an epistasis simulation pipeline
COSI: VarI COSI
  • Lorenzo Viola, Technical University of Munich, Germany
  • Paolo Tieri, CNR National Research Council, Italy
  • Jan Baumbach, Technical University of Munich, Germany
  • David B. Blumenthal, Technical University of Munich, Germany
  • Tim Kacprowski, Technical University of Munich, Germany
  • Markus List, Technical University of Munich, Germany

Short Abstract: The goal of genome-wide association studies (GWAS) is to link genetic variants to phenotypic traits of interest. Individually, single nucleotide polymorphisms (SNPs) only account for a fraction of the heritability of the investigated traits. Interactions between several SNPs that are jointly predictive of the phenotype but individually have little or no effect are thus sought out by epistasis detection tools. Since epistasis is hard to detect and confirmed true epistasis is still scarce, the evaluation of such tools crucially depends on simulation data. Yet, existing simulators do not account for linkage disequilibrium (LD), support only limited interaction models and dichotomous phenotypes, or rely on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns, and can generate both categorical and quantitative phenotypes. It is implemented in Python 3 and freely available at github.com/baumbachlab/epigen, is well documented and ships with pre-computed genotype corpora to simplify its usage. EpiGEN is the only tool offering the flexibility to simulate epistasis data under arbitrarily complex interaction models and phenotypes and is thus ideally suited as a first step in benchmarking epistasis modeling and detection tools.
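
As a toy illustration of what simulating epistasis data means, the sketch below draws independent genotypes and builds a quantitative phenotype from a multiplicative interaction between chosen SNPs plus Gaussian noise. It is not EpiGEN and does not use its interface: all names and parameters are hypothetical, and because the SNPs are sampled independently it produces no linkage disequilibrium, which is exactly the limitation of simple simulators that the abstract points out.

```python
import random


def simulate(n_individuals, n_snps, causal, effect,
             maf=0.3, noise_sd=1.0, seed=0):
    """Simulate genotypes and a quantitative phenotype with one interaction.

    `causal` lists the indices of the interacting SNPs; the phenotype is
    effect * product(genotypes at causal SNPs) + Gaussian noise.
    """
    rng = random.Random(seed)
    genotypes, phenotypes = [], []
    for _ in range(n_individuals):
        # Genotype = number of minor alleles (0, 1 or 2), drawing the two
        # alleles independently with probability `maf` each.
        g = [sum(rng.random() < maf for _ in range(2)) for _ in range(n_snps)]
        interaction = 1
        for j in causal:
            interaction *= g[j]
        y = effect * interaction + rng.gauss(0.0, noise_sd)
        genotypes.append(g)
        phenotypes.append(y)
    return genotypes, phenotypes
```

A call such as `simulate(500, 100, causal=[3, 42], effect=1.5)` gives a second-order interaction; listing more indices in `causal` raises the interaction order.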

Genome wide association study of chronic traumatic encephalopathy
COSI: VarI COSI
  • Kathryn L. Lunetta, Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
  • Gary Benson, Boston University, United States
  • Jesse Mez, BU CTE Ctr./BU AD Ctr./Department of Neurology, Boston University School of Medicine, Boston, MA, United States
  • Jaeyoon Chung, Section of Biomedical Genetics, Department of Medicine, Boston University School of Medicine, Boston, MA, United States
  • Mohammed Muzamil Khan, Bioinformatics Graduate Program, Boston University, Boston, MA, United States
  • Kathryn Atherton, Bioinformatics Graduate Program, Boston University, Boston, MA, United States
  • Conor Shea, Bioinformatics Graduate Program, Boston University, Boston, MA, United States
  • Ann C. McKee, CTE Ctr/AD Ctr/Depts. of Path. & Lab. Med./Neuro, BU Med. Sch; VA Boston Health. Sys, Boston; VA Med. Ctr, Bedford, MA, United States
  • John F. Crary, Dept. of Pathology/Fishberg Dept. of Neuroscience/Ronald M. Loeb Center for Alzheimer’s Disease, ISMMS, New York, NY, United States
  • Lindsay A. Farrer, Sec. of Biomed. Genetics, Dept. of Med./BU AD Ctr./Dept. of Neuro., BU Med. Sch.; Dept. of Biostat., BU SPH, Boston, MA, United States
  • Thor D. Stein, BU CTE Ctr./BU AD Ctr./Dept. Path. & Lab. Med., BU Med. Sch.; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Yorghos Tripodis, BU CTE Ctr./BU AD Ctr., BU School of Medicine; Dept. of Biostatistics, BU School of Public Health, Boston, MA, United States
  • Evan Nair, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Michael L. Alosco, BU CTE Ctr./BU AD Ctr./Dept. of Neurology, Boston University School of Medicine, Boston, MA, United States
  • Bertrand R. Huber, BU CTE Ctr./AD Ctr./Dept. of Neurology, BU Sch. of Med.; VA Boston Health. Sys., Boston; VA Medical Ctr., Bedford, MA, United States
  • Victor E. Alvarez, BU CTE Ctr., BU Sch. of Med./Data Coordinating Ctr., BU SPH; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Jonathan Cherry, BU CTE Ctr./AD Ctr./Dept. of Path. & Lab. Med., BU Med. Sch.; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Kurt Farrell, Dept. of Pathology/Fishberg Dept. of Neuroscience/Ronald M. Loeb Center for Alzheimer’s Disease, ISMMS, New York, NY, United States
  • Joseph N. Palmisano, Data Coordinating Center, BU School of Public Health/BU Alzheimer's Disease Center, BU School of Medicine, Boston, MA, United States
  • Brett M. Martin, Data Coordinating Center, BU School of Public Health/BU Alzheimer's Disease Center, BU School of Medicine, Boston, MA, United States
  • Madeline Uretsky, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Bobak Abdolmohammadi, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Zachary H. Baucom, Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States

Short Abstract: Chronic traumatic encephalopathy (CTE) is a neurodegenerative disease associated with repetitive head impact (RHI) exposure. The presence and severity of CTE pathology vary among those with similar RHI exposure, suggesting a role for other factors, including genetics. We conducted the first neuropathologically-confirmed CTE genome-wide association study. 237 brain donors from the Veterans Affairs-Boston University-Concussion Legacy Foundation Brain Bank with RHI exposure from contact sports and/or military service were assessed for CTE. We modeled the association of genome-wide genotyped and imputed single nucleotide polymorphisms (SNPs) with CTE and related traits in regressions adjusted for age, sex, and 10 principal components of population substructure; additional models also adjusted for years of RHI and/or were stratified by race. Three loci achieved genome-wide significance: rs10060258 [minor allele frequency (MAF)=0.38; p=4.8x10^-9; upstream of CDH9; chromosome 5], rs687269 (MAF=0.11; p=4.2x10^-9; downstream of AFDN; chromosome 6) and rs2828511 (MAF=0.48; p=4.0x10^-8; downstream of VN2R20P; chromosome 21). CDH9 is associated with risk-taking behavior, a CTE symptom. AFDN is linked with a blood-brain barrier and immune cell transmigration pathway. VN2R20P is associated with Alzheimer’s Disease, which has symptoms and neuropathology similar to those of CTE. Identified loci may implicate disease mechanisms, provide therapy targets, and guide athlete counseling regarding risk of play.

Great increase in rat genome variants collection from Hybrid Rat Diversity Panel
COSI: VarI COSI
  • Mahima Vedi, Medical College of Wisconsin, United States
  • Anne E. Kwitek, Medical College of Wisconsin, United States
  • Melinda Dwinell, Medical College of Wisconsin, United States
  • Jeff De Pons, Medical College of Wisconsin, United States
  • Marek Tutaj, Medical College of Wisconsin, United States
  • Jyothi Thota, Medical College of Wisconsin, United States
  • Harika Srividya Nalabolu, Medical College of Wisconsin, United States
  • Logan Lamers, Medical College of Wisconsin, United States
  • Monika Tutaj, Medical College of Wisconsin, United States
  • Shirng-Wern Tsaih, Medical College of Wisconsin, United States
  • Cody Plasterer, Medical College of Wisconsin, United States
  • Mary Kaldunski, Medical College of Wisconsin, United States
  • Matthew Hoffman, Medical College of Wisconsin, United States
  • Stan Laulederkind, Medical College of Wisconsin, United States
  • Shur-Jen Wang, Medical College of Wisconsin, United States
  • G. Thomas Hayman, Medical College of Wisconsin, United States
  • Jennifer R. Smith, Rat Genome Database, Medical College of Wisconsin, United States

Short Abstract: Advances in sequencing technologies lead to increases in the amount and quality of generated data, which further serve to increase the quality of genome assemblies and annotations. Together with continuous improvements in algorithms to identify genomic variants, those advances create greater demands to reexamine the archived data. The Rat Genome Database (RGD) has reanalyzed and validated strain-specific variants using the most recent version of the rat reference genome (rn6.0), which has a major increase in the number of predicted transcripts and genomic features compared to previous versions (3.4 and 5.0). The existing genomic variants come from 40 inbred rat strains used as models for hypertension, renal disease, insulin resistance, metabolic disorders, autoimmunity and cancer. The Hybrid Rat Diversity Panel (HRDP) is a combination of 96 physiologically studied inbred rat strains that are a foundation for powerful complex trait mapping and correlation analysis. Furthermore, their genomes are currently being sequenced. The strains included in the HRDP will substantially increase variation in genome sequences and phenotypic profiles for other diseases such as asthma, alcohol preference, anxiety, eye disorders, seizures and stroke. Genome variants together with phenotype measurements will be available on the Hybrid Rat Diversity Panel Portal created on the RGD website.

Insights into disease and benign missense variants in transmembrane proteins.
COSI: VarI COSI
  • James Baker, EMBL-EBI, United Kingdom
  • Antonio Ribeiro, EMBL-EBI, United Kingdom
  • Marcia Hasenahuer, EMBL-EBI, United Kingdom
  • James Stephenson, EMBL-EBI, United Kingdom
  • Roman Laskowski, EMBL-EBI, United Kingdom
  • Janet Thornton, EMBL-EBI, United Kingdom
  • I Sillitoe, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • Christine Orengo, Institute of Structural and Molecular Biology, University College London, United Kingdom

Short Abstract: Interpreting the mechanisms by which missense variants cause disease remains challenging. In this study, we focus on the missense variants of alpha-helical transmembrane proteins. Membrane-bound proteins underpin almost every biological process directly, or indirectly, from photosynthesis to respiration. The transmembrane helical regions are compositionally distinct from other non-transmembrane regions and are located in a lipid environment rather than a watery environment. It stands to reason that, as others have shown, variants in these membrane regions will have profoundly different biochemical impacts than a similar residue change in their soluble counterparts. Herein we use variants from ClinVar, Humsavar, and gnomAD, and cross-reference those against several TMH boundary resources along with structural and evolutionary information. Our findings show significant differences in disease propensities of single-pass and multipass TMHs as well as more nuanced features. We reveal novel insights into variant effects in transmembrane proteins using structurally defined pore-lining residues. We present evidence that protein families have different variant effect “rules”. We demonstrate that the functional, evolutionary, and biophysical context of a variant in the protein can improve our understanding of disease variants which is essential in order to develop therapeutic approaches to counter such deleterious effects.

Is the Bombali virus pathogenic in humans?
COSI: VarI COSI
  • Stuart Masterson, University of Kent, United Kingdom

Short Abstract: The 2013–2016 West African Ebolavirus outbreak was responsible for the deaths of over 11,000 people. Using sequencing data and detailed structural analysis, we compare the newly discovered Bombali ebolavirus to Reston ebolavirus, the only species not pathogenic in humans, and the other four pathogenic Ebolavirus species, to determine whether Bombali virus is pathogenic. Here we present an updated analysis of original work using over 1,400 full-length Ebolavirus genomes from all six species, an eight-fold increase on the original 196. We use specificity determining positions (SDPs), residues that are differentially conserved between two groups, to identify molecular determinants of pathogenicity. The number of SDPs falls from 180 originally to 165, though with a significant (73%) overlap between the two datasets, demonstrating the accuracy of our approach. We propose that VP24 is the primary protein involved in pathogenicity, with residues located in the VP24/Karyopherin-alpha binding site highlighted as key determinants of the differences seen between Reston virus and the pathogenic species. For two specific binding site SDPs (M136L, R139S) that have been suggested as critical for the lack of Reston virus human pathogenicity, Bombali virus amino acids match those of Reston virus, suggesting Bombali virus may not be pathogenic in humans.
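
The idea of a specificity determining position, a residue conserved within each group but different between the groups, can be shown with a naive sketch. The conservation threshold, the function names and the input format (equal-length aligned sequences as strings) are assumptions; the abstract does not specify how its SDPs were computed, and dedicated SDP methods use more careful statistics.

```python
from collections import Counter


def conserved_residue(column, min_fraction):
    """Return the dominant residue of an alignment column if it is conserved."""
    residue, count = Counter(column).most_common(1)[0]
    if residue != "-" and count / len(column) >= min_fraction:
        return residue
    return None


def specificity_determining_positions(group_a, group_b, min_fraction=0.9):
    """List (position, residue_a, residue_b) for differentially conserved columns.

    Both groups are lists of aligned sequences of equal length;
    positions are reported 1-based.
    """
    sdps = []
    for i in range(len(group_a[0])):
        a = conserved_residue([seq[i] for seq in group_a], min_fraction)
        b = conserved_residue([seq[i] for seq in group_b], min_fraction)
        # Conserved in both groups, but to different residues.
        if a and b and a != b:
            sdps.append((i + 1, a, b))
    return sdps
```

With the pathogenic species as one group and Reston virus as the other, a column that is always M in the first and always L in the second would be reported in the same form as the M136L position mentioned in the abstract.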

Machine Learning for improved prioritization of Variants of Uncertain Significance
COSI: VarI COSI
  • Daniel Mahecha, Universidad de los Andes, Colombia
  • Haydemar Nunez, Universidad de los Andes, Colombia
  • Maria Claudia Lattig, Universidad de los Andes, Colombia
  • Jorge Duitama, Universidad de los Andes, Colombia

Short Abstract: Diagnosis of genetic diseases from high-throughput DNA sequencing data is becoming a common practice. An important concern among practitioners interpreting genetic diagnostic reports is the significant number of disease-related variants classified as Variants of Uncertain Significance (VUS). Due to sampling biases in public databases, non-Caucasian patients show a greater proportion of these inconclusive variants. Here, we present the application of different machine-learning methods to test whether the variant classification process can be improved to prioritize which VUS are suitable candidates for functional studies. We trained and compared a Naive Bayes model, a Random Forest (RF), a Support Vector Machine, and a five-layer perceptron (MLP) using variants from ClinVar that were labeled as VUS in October 2017 but had been reclassified as pathogenic, likely pathogenic, likely benign, or benign by October 2019. A set of conservation scores and 1000 Genomes global allele frequencies were used as features for model training. The RF and the MLP models showed the highest accuracy, above commonly used tools for variant deleteriousness prediction, on a set of missense VUS. We believe that the models trained in this work can be integrated into analysis pipelines to improve the accuracy of genetic diagnosis.

Missense3D: How effectively can one model the structural impact of a missense variant in both experimental and predicted globular and transmembrane protein structures?
COSI: VarI COSI
  • Alessia David, Imperial College London, United Kingdom
  • Tarun Khanna, Imperial College London, United Kingdom
  • Sirawit Ittisoponpisan, Imperial College London, United Kingdom
  • Suhail A Islam, Imperial College London, United Kingdom
  • Eman Alhuzimi, Imperial College London, United Kingdom
  • Michael Sternberg, Imperial College London, United Kingdom

Short Abstract: A missense variant can disrupt protein stability with consequential disease association. However, only ~17% of residues in the human proteome are within an experimental PDB structure, whilst template-based prediction can cover an additional ~35%. It is important that any structure-based evaluation of missense variants can be applied to, and has been benchmarked on, both experimental and template-modelled structures. This motivated our development of Missense3D. The SCWRL4 algorithm remodels the local environment of the variant side-chain, keeping the main-chain fixed. We mapped Humsavar, ClinVar and ExAC variants with their neutral or disease classifications onto 606 protein structures. 40% of the 1,965 variants associated with disease are identified by Missense3D as having a destabilising impact (i.e. the true positive rate, TPR), whilst only 11% of the neutral variants had an impact (the false positive rate, FPR). Similar results are reported for predicted structures. Missense3D can be used via a web server. Missense3D has analysed over 10 million missense variants from UniProt, ClinVar and gnomAD and the results are available via the Missense3D web page. We have extended Missense3D to analyse membrane-spanning proteins, both experimental and predicted, and overall the TPR and FPR are 42% and 15%, with no significant (5%) difference between the results for experimental and predicted transmembrane structures.
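
As a reminder of how the TPR and FPR quoted above are defined, here is a small worked sketch. The disease-variant total (1,965) comes from the abstract; the split into 786 flagged and 1,179 unflagged variants is simply 40% of that total, and the neutral counts (220 flagged out of 2,000) are hypothetical numbers chosen to give 11%, since the abstract does not state the size of the neutral set.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR) from confusion-matrix counts.

    TPR = TP / (TP + FN): fraction of disease variants flagged as damaging.
    FPR = FP / (FP + TN): fraction of neutral variants flagged as damaging.
    """
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr


# 786 + 1179 = 1965 disease variants (total from the abstract).
# 220 + 1780 = 2000 neutral variants (hypothetical total, for illustration only).
tpr, fpr = rates(tp=786, fn=1179, fp=220, tn=1780)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```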

Pleiotropy analysis of chronic traumatic encephalopathy with other tauopathies including progressive supranuclear palsy and Alzheimer’s disease
COSI: VarI COSI
  • Kathryn Lunetta, Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
  • Gary Benson, Boston University, United States
  • Jesse Mez, BU CTE Ctr./BU AD Ctr./Department of Neurology, Boston University School of Medicine, Boston, MA, United States
  • Jaeyoon Chung, Section of Biomedical Genetics, Department of Medicine, Boston University School of Medicine, Boston, MA, United States
  • Mohammed Muzamil Khan, Bioinformatics Graduate Program, Boston University, Boston, MA, United States
  • Kathryn Atherton, Bioinformatics Graduate Program, Boston University, Boston, MA, United States
  • Conor Shea, Bioinformatics Graduate Program, Boston University, Boston, MA, United States
  • Ann McKee, CTE Ctr/AD Ctr/Depts. of Path. & Lab. Med./Neuro, BU Med. Sch; VA Boston Health. Sys, Boston; VA Med. Ctr, Bedford, MA, United States
  • John Crary, Dept. of Pathology/Fishberg Dept. of Neuroscience/Ronald M. Loeb Center for Alzheimer’s Disease, ISMMS, New York, NY, United States
  • Lindsay Farrer, Sec. of Biomed. Genetics, Dept. of Med./BU AD Ctr./Dept. of Neuro., BU Med. Sch.; Dept. of Biostat., BU SPH, Boston, MA, United States
  • Thor Stein, BU CTE Ctr./BU AD Ctr./Dept. Path. & Lab. Med., BU Med. Sch.; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Yorghos Tripodis, BU CTE Ctr./BU AD Ctr., BU School of Medicine; Dept. of Biostatistics, BU School of Public Health, Boston, MA, United States
  • Evan Nair, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Michael Alosco, BU CTE Ctr./BU AD Ctr./Dept. of Neurology, Boston University School of Medicine, Boston, MA, United States
  • Bertrand Huber, BU CTE Ctr./AD Ctr./Dept. of Neurology, BU Sch. of Med.; VA Boston Health. Sys., Boston; VA Medical Ctr., Bedford, MA, United States
  • Victor Alvarez, BU CTE Ctr., BU Sch. of Med./Data Coordinating Ctr., BU SPH; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Jonathan Cherry, BU CTE Ctr./AD Ctr./Dept. of Path. & Lab. Med., BU Med. Sch.; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Kurt Farrell, Dept. of Pathology/Fishberg Dept. of Neuroscience/Ronald M. Loeb Center for Alzheimer’s Disease, ISMMS, New York, NY, United States
  • Joseph Palmisano, Data Coordinating Center, BU School of Public Health/BU Alzheimer's Disease Center, BU School of Medicine, Boston, MA, United States
  • Brett Martin, Data Coordinating Center, BU School of Public Health/BU Alzheimer's Disease Center, BU School of Medicine, Boston, MA, United States
  • Madeline Uretsky, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Bobak Abdolmohammadi, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Zachary Baucom, Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States

Short Abstract: Chronic traumatic encephalopathy (CTE) is a neurodegenerative disease associated with repetitive head impact exposure. It is characterized by hyperphosphorylated tau aggregation, which is a common pathological finding in other neurodegenerative diseases including progressive supranuclear palsy (PSP) and late onset Alzheimer’s disease (AD). In this study, we conducted pleiotropy analyses using a genome-wide association study (GWAS) of CTE stage (144 cases, 93 controls) and summary statistics from two GWASs of PSP (114 cases, 3247 controls) and AD (21,982 cases, 41,944 controls). We used a conditional false discovery rate (cFDR) method with a cFDR threshold of 0.01. We identified a single nucleotide polymorphism (SNP) near FA2H from the cFDR analysis with CTE and PSP (rs7190905; minor allele frequency [MAF] = 0.49; P_CTE = 6.6×10⁻³; P_PSP = 1.0×10⁻⁴; cFDR = 6.6×10⁻³). For CTE and AD, we observed one pleiotropic SNP near PDE6H (rs10846103; MAF = 0.03; P_CTE = 1.5×10⁻⁵; P_AD = 1.9×10⁻³; cFDR = 0.005). Homozygous mutations in FA2H lead to a childhood autosomal recessive leukodystrophy. FA2H is also associated with intelligence and MRI white matter lesion progression in GWAS. PDE6H is associated with rare eye disorders, specifically retinal cone dystrophy type 3A. Our findings may help to advance understanding of common biological mechanisms underlying tau-related diseases.

Population-specific VNTR Alleles in the Human Genome
COSI: VarI COSI
  • Marzieh Eslami Rasekh, Boston University, United States
  • Yozen Hernandez, The Rockefeller University, United States
  • Gary Benson, Boston University, United States

Short Abstract: Variable number tandem repeats (VNTRs) are polymorphic DNA tandem repeat loci in which the number of pattern copies varies across a population. Human minisatellite VNTR loci (with pattern lengths from seven to hundreds of base pairs) have a variety of functional effects (transcription factor binding, RNA splicing) and are associated with disease (neurodegenerative disorders, cancers, Alzheimer’s disease). Despite their importance, relatively few minisatellite VNTRs have been identified and studied in detail. As part of a large survey of VNTR occurrence in over 2,500 human whole genome sequencing samples from the 1000 Genomes Project, we sought to identify population-specific VNTR alleles. We found 5,541 “common” VNTR loci (occurring in ≥ 5% of the samples) and used their alleles to develop a decision tree classification model to predict super-population membership, with 97.81% accuracy. We then identified the 1,283 most population-predictive alleles. Finally, we developed a novel ‘Virtual Gel’ illustration showing how alleles differ across populations at population-specific loci. This is the first large-scale study of population-specific VNTRs, and the information obtained could be useful for haplotype inference, studies of human migration and evolution, and accurate use of VNTRs in GWAS.

Predicting therapeutic sensitivities from network rewiring of fusion
COSI: VarI COSI
  • Kivilcim Ozturk, University of California San Diego, United States
  • Hannah Carter, University of California San Diego, United States

Short Abstract: The complexity of biological systems makes it difficult to evaluate the impact of individual mutations at the level of cellular processes. To overcome this, we propose to study the effects of cancer mutations in the context of molecular network architectures. As mutations causing larger disruption to the underlying network can create more vulnerabilities that can be used as drug targets, we focus on fusion events, which can interfere with many interactions of their parent proteins. The RUNX1-RUNX1T1 translocation, t(8;21)(q22;q22), in AML is thought to be a pre-leukemic event that can result in tumors after additional mutations are acquired. To study the transcriptional effects of the RUNX1-RUNX1T1 fusion on the underlying interactome, U937T cells with tetracycline-inducible RUNX1-RUNX1T1 were grown in triplicate and RNA was harvested prior to induction and at certain times post-induction. Wild-type U937T cells with no perturbation served as a negative control. RNA sequencing was performed, followed by read alignment to the reference genome. We found 1,582 genes to be differentially expressed in fusion-induced cells versus wild type at FDR<0.05, of which 47 encode proteins in the RUNX1-RUNX1T1 subnetwork. Gene set enrichment analysis using these genes revealed that 30 gene sets are significantly enriched at nominal p-value<0.01.

Prioritizing credible causal risk variants for epithelial ovarian cancer by genomic and epigenomic annotation analyses
COSI: VarI COSI
  • Pei-Chen Peng, Cedars-Sinai Medical Center, United States
  • Alberto Reyes, Cedars-Sinai Medical Center, United States
  • Eileen Darang, University of Cambridge, United Kingdom
  • Felipe Dezem, Cedars-Sinai Medical Center, United States
  • Jonathan Tyrer, University of Cambridge, United Kingdom
  • Rosario Corona, Cedars-Sinai Medical Center, United States
  • Brian Davis, Cedars-Sinai Medical Center, United States
  • Stephanie Chen, Cedars-Sinai Medical Center, United States
  • Ji-Heui Seo, Dana-Farber Cancer Institute, United States
  • Ovarian Cancer Association Consortium, Ovarian Cancer Association Consortium, United States
  • Kate Lawrenson, Cedars-Sinai Medical Center, United States
  • Jasmine Plummer, Cedars-Sinai Medical Center, United States
  • Matthew Freedman, Dana-Farber Cancer Institute, United States
  • Paul Pharoah, University of Cambridge, United Kingdom
  • Simon Gayther, Cedars-Sinai Medical Center, United States
  • Michelle Jones, Cedars-Sinai Medical Center, United States

Short Abstract: Genome-wide association studies (GWAS) have identified 44 confirmed genomic regions associated with risk of epithelial ovarian cancer (EOC). Over time, we have accumulated genomic and epigenomic profiling datasets for EOC-related tissues, enabling us to prioritize the function of disease risk-associated single nucleotide polymorphisms (SNPs) to infer genotype-phenotype relationships. Here, we studied GWAS data for 26,151 EOC cases and 105,724 controls imputed to the Haplotype Reference Consortium reference panel. We first estimated the heritability explained by known common SNPs for EOC. The narrow-sense heritability for EOC overall was estimated to be ~5%. We further partitioned SNP-heritability across broad functional categories and EOC-related epigenomic annotations. A ChromHMM model was trained to characterize the chromatin states in 18 EOC cell lines and precursor cell types using H3K27ac, H3K4me1, H3K4me3 histone marks, and RNA-seq. We identified significant enrichment of risk SNPs in histotype-matched active promoters, active enhancers, and 3’ UTRs. Lastly, the significant annotations were integrated with GWAS summary statistics to estimate the probability of causality for all SNPs from the risk loci on the densely imputed dataset, using trained histotype-specific PAINTOR models. Overall, we demonstrated that functional annotations can guide the prioritization of EOC risk variants.

Reconciling Diverse SNP-Gene Association Methods to Delineate the Functional Basis of Complex Traits and Diseases
COSI: VarI COSI
  • Alexander McKim, Michigan State University, United States
  • Arjun Krishnan, Michigan State University, United States

Short Abstract: Genome-wide association study (GWAS) is a powerful approach for identifying single-nucleotide polymorphisms (SNPs) across the genome that are associated with complex polygenic traits like ‘blood pressure’ and ‘language development’. Multiple classes of methods have been developed to relate the identified SNPs to genes, and subsequently cellular processes, that constitute the mechanistic basis of the trait/disease. Yet, it is not clear how SNP-gene associations based on entirely different modalities – chromosomal proximity, short-range gene regulation (via expression quantitative trait loci), and long-range interactions (via 3D contact) – compare and relate to each other. We have developed a scalable computational approach to analyze millions of GWAS variants from publicly available databanks and systematically compare these modalities across hundreds of traits/diseases at the level of genes and biological processes. This analysis points to robust molecular features for each trait/disease captured by multiple modalities as well as features common to multiple traits/diseases. Together, our approach will help chart mechanism-guided paths from genetic variants to diagnostic and treatment options.

Seak marries regulatory genomics deep learning with rare-variant association tests
COSI: VarI COSI
  • Remo Monti, Hasso Plattner Institute, Max Delbrück Center for Molecular Medicine, Germany
  • Pia Rautenstrauch, Hasso Plattner Institute, University of Tübingen, Germany
  • Stefan Konigorski, Hasso Plattner Institute, Germany
  • Alva Rani James, Hasso Plattner Institute, Germany
  • Mahsa Ghanbari, Max Delbrück Center for Molecular Medicine, Germany
  • Uwe Ohler, Max Delbrück Center for Molecular Medicine, Germany
  • Christoph Lippert, Hasso Plattner Institute, Germany

Short Abstract: Sequencing-based genotyping methods are on the rise, yet leveraging the predominantly rare genetic variants they measure remains challenging. Large rare-variant association studies have mainly focused on protein-altering variation while little attention has been given to variants acting on the RNA level or other non-coding regulatory mechanisms. For these mechanisms, deep learning has recently been successful at predicting the effects of genetic variants.

Here we introduce seak (sequence annotations in kernel-based tests), a Python package that flexibly integrates variant effect predictions into set-based association tests while controlling for relatedness and population structure using linear mixed models. We first show that using functional variant effect predictions can increase statistical power in simulation studies and shed light on potentially causal mechanisms. Then we apply seak to the UK Biobank exome-sequencing dataset. We perform association tests for three biomarkers of cardiovascular disease and cancer, incorporating deep-learning-derived variant effects for disease-related RNA-binding proteins. With this novel approach we find two significant associations for each biomarker, which include both novel and known associations.

Our results demonstrate that, by incorporating regulatory variant effects, seak can identify novel biologically interpretable associations, thereby unlocking the potential of whole-exome and whole-genome sequencing studies.

The Annotation Query (AnnoQ): An Integrated Functional Annotation Platform for Large Scale Genetic Variant Annotation
COSI: VarI COSI
  • Zhu Liu, Keck School of Medicine at University of Southern California, United States
  • Tremayne Mushayahama, Keck School of Medicine at University of Southern California, United States
  • Huaiyu Mi, Keck School of Medicine at University of Southern California, United States

Short Abstract: The goal of AnnoQ is to provide an easy-to-use user interface for scientists with different bioinformatics skills, including bench scientists and statisticians, to retrieve large-scale genetic variant annotation data. The backend of the system is a large collection of pre-annotated variants from the Haplotype Reference Consortium (~39 million) with over 1000 annotation fields, including sequence features (by WGSA) and functions/pathways (Gene Ontology, PANTHER and Reactome). The data is built in an Elasticsearch framework with an API for users to query variants by multiple criteria, such as a chromosome range, gene/variant IDs, and full-text keyword search. The API supports three different interfaces for users to access the data. The first is a web interface through which users can explore the data interactively, annotate a small variant dataset, and generate a configuration file to specify the annotation fields to be used in the other two interfaces. The second is through the command line. The third is through programming scripts, for example, in R. An R package is provided for users to access annotation data from within an R script. In addition, our infrastructure is horizontally scalable. It has the ability to expand up to a billion variants and return results in near real-time.

The H1b MAPT Sub-Haplotype provides a protective effect against CTE progression
COSI: VarI COSI
  • Jonathan Cherry, BU CTE Ctr./AD Ctr./Dept. of Path. & Lab. Med., BU Med. Sch.; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Gary Benson, Boston University, United States
  • Jesse Mez, BU CTE Ctr./BU AD Ctr./Department of Neurology, Boston University School of Medicine, Boston, MA, United States
  • Ann C. McKee, CTE Ctr/AD Ctr/Depts. of Path. & Lab. Med./Neuro, BU Med. Sch; VA Boston Health. Sys, Boston; VA Med. Ctr, Bedford, MA, United States
  • John F. Crary, Dept. of Pathology/Fishberg Dept. of Neuroscience/Ronald M. Loeb Center for Alzheimer’s Disease, ISMMS, New York, NY, United States
  • Lindsay A. Farrer, Sec. of Biomed. Genetics, Dept. of Med./BU AD Ctr./Dept. of Neuro., BU Med. Sch.; Dept. of Biostat., BU SPH, Boston, MA, United States
  • Thor D. Stein, BU CTE Ctr./BU AD Ctr./Dept. Path. & Lab. Med., BU Med. Sch.; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Yorghos Tripodis, BU CTE Ctr./BU AD Ctr., BU School of Medicine; Dept. of Biostatistics, BU School of Public Health, Boston, MA, United States
  • Kathryn L. Lunetta, Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
  • Michael L. Alosco, BU CTE Ctr./BU AD Ctr./Dept. of Neurology, Boston University School of Medicine, Boston, MA, United States
  • Bertrand R. Huber, BU CTE Ctr./AD Ctr./Dept. of Neurology, BU Sch. of Med.; VA Boston Health. Sys., Boston; VA Medical Ctr., Bedford, MA, United States
  • Victor E. Alvarez, BU CTE Ctr., BU Sch. of Med./Data Coordinating Ctr., BU SPH; VA Boston Health. Sys., Boston; VA Med. Ctr., Bedford, MA, United States
  • Conor Shea, Bioinformatics Graduate Program, Boston University, Boston, MA, United States
  • Kurt Farrell, Dept. of Pathology/Fishberg Dept. of Neuroscience/Ronald M. Loeb Center for Alzheimer’s Disease, ISMMS, New York, NY, United States
  • Joseph N. Palmisano, Data Coordinating Center, BU School of Public Health/BU Alzheimer's Disease Center, BU School of Medicine, Boston, MA, United States
  • Brett M. Martin, Data Coordinating Center, BU School of Public Health/BU Alzheimer's Disease Center, BU School of Medicine, Boston, MA, United States
  • Madeline Uretsky, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Bobak Abdolmohammadi, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Zachary H. Baucom, Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
  • Evan Nair, Boston University Chronic Traumatic Encephalopathy Center, Boston University School of Medicine, Boston, MA, United States
  • Jaeyoon Chung, Section of Biomedical Genetics, Department of Medicine, Boston University School of Medicine, Boston, MA, United States
  • Mohammed Muzamil Khan, Bioinformatics Graduate Program, Boston University, Boston, MA, United States
  • Kathryn Atherton, Bioinformatics Graduate Program, Boston University, Boston, MA, United States

Short Abstract: Chronic traumatic encephalopathy (CTE) is a neurodegenerative tauopathy associated with repetitive head impact (RHI) exposure. CTE presence and severity vary among those with similar RHI exposure, suggesting a role for other factors, including genetics. The MAPT gene, which encodes tau, is associated with other tauopathies, including frontotemporal lobar degeneration. However, MAPT’s role in CTE remains unclear. 208 cases and 41 controls from the Veterans Affairs-Boston University-Concussion Legacy Foundation Brain Bank with RHI exposure from contact sports and/or military service were genotyped for 9 MAPT single nucleotide polymorphisms (SNPs) that define well-characterized sub-haplotypes: rs1467967, rs1800547, rs242557, rs2471738, rs3785883, rs62063857, rs7521, rs8070723, rs9468. We modeled the association of SNPs with CTE, CTE stage, dementia, and tau burden, adjusting for age, self-reported race, and years of RHI exposure. rs7521 and rs1467967, which tag the H1b sub-haplotype, were negatively associated with CTE stage [rs7521: minor allele frequency (MAF)=0.46, beta=-1.51, p=0.002; rs1467967: MAF=0.35, beta=-1.08, p=0.018]. We observed significant interactions between minor allele dosage and RHI exposure for both SNPs [rs7521: beta=0.08, p=0.013; rs1467967: beta=0.077, p=0.011]. These findings suggest a relationship between genetic and environmental factors in CTE.

The PepVEP variant interpretation service
COSI: VarI COSI
  • Andrew Nightingale, EMBL-EBI, United Kingdom
  • Sarah Hunt, EMBL-EBI, United Kingdom
  • Jie Luo, EMBL-EBI, United Kingdom
  • Mahdi Mahmoudy, EMBL-EBI, United Kingdom
  • Alok Mishra, EMBL-EBI, United Kingdom
  • Sreenath Nair, EMBL-EBI, United Kingdom
  • Sameer Velankar, EMBL-EBI, United Kingdom
  • James Stephenson, EMBL-EBI, United Kingdom
  • Roman Laskowski, EMBL-EBI, United Kingdom
  • Janet Thornton, EMBL-EBI, United Kingdom
  • María Martin, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom

Short Abstract: Associating variants with a molecular consequence that explains an observed disease phenotype is still a bottleneck in clinical genomics. Combining genetic information with expert curated functional annotations from UniProt and PDBe is particularly important for identifying causal variants and interpreting their molecular consequence in a disease. The Protein Variant Effect Predictor (PepVEP) provides the scientific community with a variation interpretation service that links genetic variation and protein function.

PepVEP combines information from the Variant Effect Predictor (VEP) with up-to-date high-quality protein functional annotations and clinical information from UniProt, and protein structure functional annotations from PDBe. The service allows users to explore functional knowledge for submitted variants and consequence predictions from CADD, PolyPhen, SIFT, etc. Users can also get a summary of variant knowledge from a catalogue of variant data sets, such as gnomAD, TCGA and ClinVar, including each dataset’s population frequencies. Variants in a protein coding region of the genome are reported with protein functional annotations like domains and important sites from UniProt and structure functional annotations like ligands and protein-protein interactions from PDBe. Users can compare submitted novel/private variants with reported co-located variants and use the known variant functional annotations to interpret and infer the consequences of these variants.

Transfer learning enables prediction of CYP2D6 haplotype function
COSI: VarI COSI
  • Gregory McInnes, Stanford University, United States
  • Russ Altman, Stanford University, United States
  • Erica Woodahl, University of Montana, United States

Short Abstract: Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene whose protein product metabolizes more than 20% of clinically used drugs. Genetic variations in CYP2D6 are responsible for interindividual heterogeneity in drug response that can lead to drug toxicity and ineffective treatment, making CYP2D6 one of the most important pharmacogenes. Prediction of CYP2D6 phenotype relies on curation of literature-derived functional studies to assign a functional status to CYP2D6 haplotypes. As the number of large-scale sequencing efforts grows, new variants and haplotypes continue to be discovered, and assignment of function is challenging to maintain. To address this challenge, we have trained a deep learning model to predict functional status of CYP2D6 haplotypes, called Hubble.2D6. We find that Hubble.2D6 predicts CYP2D6 haplotype functional status with 88% accuracy in a held out test set and explains a significant amount of the variability in in vitro functional data. Hubble.2D6 may be a useful tool for assigning function to haplotypes with uncurated function, which may be used for screening individuals who are at risk of being poor metabolizers.

Using an Integrative Machine Learning Approach Utilising Homology Modelling to Clinically Interpret Genetic Variants: CACNA1F as an Exemplar
COSI: VarI COSI
  • Shalaw Sallah, The University of Manchester, United Kingdom
  • Panagiotis Sergouniotis, The University of Manchester, United Kingdom
  • Stephanie Barton, The University of Manchester, United Kingdom
  • Simon Ramsden, The University of Manchester, United Kingdom
  • Rachel Taylor, The University of Manchester, United Kingdom
  • Amro Safadi, The University of Manchester, United Kingdom
  • Jamie Ellingford, The University of Manchester, United Kingdom
  • Mitra Kabir, The University of Manchester, United Kingdom
  • Nick Lench, Congenica Ltd, United Kingdom
  • Simon Lovell, The University of Manchester, United Kingdom
  • Graeme Black, The University of Manchester, United Kingdom

Short Abstract: Advances in DNA sequencing technologies have revolutionised rare disease diagnostics leading to an increase in the volume of available genomic data. A key challenge that needs to be overcome to realise the full potential of these technologies is to precisely predict the effect of genetic variants on molecular and organismal phenotypes. Notably, despite recent progress, there is still a lack of robust in silico tools that accurately assign clinical significance to variants. Genetic alterations in the CACNA1F gene are the commonest cause of X-linked incomplete Congenital Stationary Night Blindness, a condition associated with non-progressive visual impairment. We combined genetic and homology modelling data to produce CACNA1F-vp, an in silico model that differentiates disease-implicated from benign missense CACNA1F changes. CACNA1F-vp predicts variant effects on the structure of the CACNA1F encoded protein (a calcium channel) using parameters based upon changes in amino acid properties and position. CACNA1F-vp outperformed four other tools in identifying disease-implicated variants (area under curve ROC & PR = 0.84; MCC = 0.52). We consider this protein-specific model to be a robust stand-alone diagnostic classifier that could be replicated in other proteins and could enable precise and timely diagnosis.

Using structural analysis to clinically interpret missense variants: disease associated X-linked genes as an exemplar
COSI: VarI COSI
  • Shalaw Sallah, The University of Manchester, United Kingdom
  • Nick Lench, Congenica Ltd, United Kingdom
  • Simon Lovell, The University of Manchester, United Kingdom
  • Graeme Black, The University of Manchester, United Kingdom

Short Abstract: Rare monogenic diseases, most of which are caused by missense variants in protein-coding regions, affect millions worldwide. Differentiating these causative variants from the large number of benign variants reported through next generation sequencing is a major challenge. More accurate variant prediction in the clinic will increase the rate of patient diagnosis and enable more personalized treatment. In the absence of segregation and functional data, using prediction tools helps to focus downstream analysis on a few suspect variants. However, inconsistency and discordance among these tools render such phenotypic predictions unreliable. Therefore, identifying an accurate and robust approach to variant interpretation is necessary to facilitate clinical diagnosis.

Here, we introduce ProSper, a protein-specific variant interpreter that combines genetic and homology modelling data to identify structural features and classify variants. ProSper outperformed REVEL and VEST4 in predicting variants in 13/21 disease-associated X-linked genes. In addition, we identified gene-specific pathogenicity thresholds which resulted in more accurate variant prediction by REVEL and VEST4 in 10/21 genes.

We consider ProSper to form the basis of a family-specific variant classifier that can be used as a stand-alone diagnostic tool. Its accuracy and robustness can facilitate precise and timely diagnosis.

VarSAn: Variant set characterization using random walk with restart on heterogeneous network
COSI: VarI COSI
  • Xiaoman Xie, University of Illinois at Urbana-Champaign, United States
  • Saurabh Sinha, University of Illinois at Urbana-Champaign, United States

Short Abstract: Genotype-to-phenotype studies continue to identify sets of genomic variants associated with diseases. These variants must then be interpreted mechanistically, i.e., in terms of the molecular pathways or regulatory interactions that are impacted by them. Such mechanistic interpretation can be challenging especially for the non-coding variants which are the majority of GWAS findings. In light of the formidable challenges in the field of single non-coding variant interpretation, a pragmatic related goal is to discover the system-level insights that a set of phenotypic variants point to, for example, driver genes and pathways that several variants in the set are associated with. Such insights are especially useful in studies of complex diseases where no single variant explains etiology. Here we provide a new method for this ‘SNP set characterization’ task, called ‘VarSAn’ (Variant Set Analysis), that uses graph random walk-based methods to identify mechanistic properties such as pathways relevant to a given set of variants. Instead of annotating individual variants or performing an enrichment test for a single type of annotation, our tool aggregates diverse annotations of a collection of variants, along with prior knowledge about genes and pathways, to provide systems-level insights into those variants.
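
The general technique named in the title, random walk with restart, can be sketched on a toy heterogeneous network. This is a generic power-iteration implementation, not VarSAn's code: the node names, edges, restart probability, and function name are all assumptions made for illustration. The walker restarts at the query variants with probability `restart`, and the stationary probabilities rank genes and pathways by proximity to the variant set.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that, at each step,
    jumps back to a uniformly chosen seed with probability `restart` and
    otherwise moves to a uniformly chosen neighbour."""
    nodes = sorted(adjacency)
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(restart_vec)
    for _ in range(max_iter):
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Dangling node: send its walking mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - restart) * prob[n] / len(seeds)
                continue
            share = (1 - restart) * prob[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Toy undirected network (hypothetical): variants link to genes, genes to pathways.
edges = [
    ("snp1", "geneA"), ("snp2", "geneA"), ("snp2", "geneB"),
    ("geneA", "pathwayX"), ("geneB", "pathwayY"), ("geneC", "pathwayY"),
]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, []).append(v)
    adjacency.setdefault(v, []).append(u)

scores = random_walk_with_restart(adjacency, seeds=["snp1", "snp2"])
for node in sorted(scores, key=scores.get, reverse=True):
    print(f"{node}: {scores[node]:.4f}")
```

In this toy example geneA, which is linked to both query variants, scores above geneB, and pathwayX scores above pathwayY. A real analysis would use typed, weighted edges and would normally correct the scores for node degree.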